Smart Paper Notes

ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

Metin Ersin Arican , Ozgur Kara , Gustav Bredell , Ender Konukoglu

Category: Computer Vision

2021-11-27

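Recent works show that convolutional neural network (CNN) architectures have a spectral bias towards lower frequencies, which has been leveraged for various image restoration tasks in the Deep Image Prior (DIP) framework. The benefit of the inductive bias that the network imposes in the DIP framework depends on the architecture. Researchers have therefore studied how to automate the search for the best-performing model. However, common neural architecture search (NAS) techniques are resource- and time-intensive. Moreover, the best-performing model is determined for a whole dataset of images rather than for each image independently, since a per-image search would be prohibitively expensive. In this work, we first show that the optimal neural architecture in the DIP framework is image-dependent. Leveraging this insight, we then propose an image-specific NAS strategy for the DIP framework that requires substantially less training than typical NAS approaches, effectively enabling image-specific NAS. For a given image, noise is fed to a large set of untrained CNNs, and the power spectral densities (PSDs) of their outputs are compared with that of the corrupted image using various metrics. Based on this comparison, a small cohort of image-specific architectures is chosen and trained to reconstruct the corrupted image. Within this cohort, the model whose reconstruction is closest to the average of the reconstructed images is chosen as the final model. We justify the effectiveness of the proposed strategy by (1) demonstrating its performance on a NAS dataset that includes more than 500 models from a particular search space, and (2) conducting extensive experiments on image denoising, inpainting, and super-resolution tasks. Our experiments show that image-specific metrics can reduce the search space to a small cohort of models, of which the best model outperforms current NAS approaches for image restoration.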
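The two selection steps described in this abstract (ranking untrained-model outputs by how well their PSD matches the corrupted image, then picking the reconstruction closest to the cohort average) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses 1D signals in place of images, a mean squared PSD difference as a stand-in for the paper's unspecified "various metrics", and omits the training step entirely. All function names are hypothetical.

```python
import cmath
import math


def power_spectrum(signal):
    """Power spectral density |DFT|^2 / N of a 1D real signal (naive O(N^2) DFT)."""
    n = len(signal)
    psd = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        psd.append(abs(coeff) ** 2 / n)
    return psd


def psd_distance(a, b):
    """Mean squared difference between two PSDs (an assumed, illustrative metric)."""
    pa, pb = power_spectrum(a), power_spectrum(b)
    return sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)


def select_cohort(outputs, corrupted, k):
    """Indices of the k model outputs whose PSD is closest to the corrupted signal."""
    ranked = sorted(range(len(outputs)),
                    key=lambda i: psd_distance(outputs[i], corrupted))
    return ranked[:k]


def closest_to_mean(reconstructions):
    """Index of the reconstruction nearest (squared L2) to the element-wise mean."""
    n = len(reconstructions)
    mean = [sum(vals) / n for vals in zip(*reconstructions)]

    def dist(r):
        return sum((x - m) ** 2 for x, m in zip(r, mean))

    return min(range(n), key=lambda i: dist(reconstructions[i]))


# Toy data: outputs of three hypothetical untrained models on a noise input.
corrupted = [0.0, 1.0, 0.0, -1.0]
outputs = [[0.0, 1.0, 0.0, -1.0], [1.0, 1.0, 1.0, 1.0], [0.0, 0.9, 0.0, -0.9]]
cohort = select_cohort(outputs, corrupted, k=2)

# Toy reconstructions from a trained cohort; the middle one is nearest the mean.
final_index = closest_to_mean([[1.0, 1.0], [2.0, 2.0], [6.0, 6.0]])
```

In this toy setup the constant signal is ranked last because all of its power sits at zero frequency, while the corrupted signal has none there.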


A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition

Gürkan Soykan , Deniz Yuret , Tevfik Metin Sezgin

Category: Natural Language Processing | Artificial Intelligence

2022-12-27

This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for western comics, called "COMICS Text+: Detection" and "COMICS Text+: Recognition". We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called "COMICS Text+", which contains the extracted text from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in an existing comics processing model resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on tasks including text detection, recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed at https://github.com/gsoykan/comics_text_plus.


A transformer-based deep learning approach for classifying brain metastases into primary organ sites using clinical whole brain MRI

Qing Lyu , Sanjeev V. Namjoshi , Emory McTyre , Umit Topaloglu , Richard Barcus , Michael D. Chan , Christina K. Cramer , Waldemar Debinski , Metin N. Gurcan , Glenn J. Lesser

Category: Computer Vision

2021-10-07

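Treatment decisions for brain metastatic disease rely on knowledge of the primary organ site, which is currently determined with biopsy and histology. Here we develop a novel deep learning approach for accurate non-invasive digital histology with whole-brain MRI data. Our IRB-approved, single-site retrospective study comprised patients (n = 1,399) referred for MRI treatment planning and gamma knife radiosurgery over 19 years. Contrast-enhanced T1-weighted and T2-weighted fluid-attenuated inversion recovery brain MRI exams (n = 1,582) were preprocessed and input to the proposed deep learning workflow for tumor segmentation, modality transfer, and primary site classification into one of five classes (lung, breast, melanoma, renal, and others). Ten-fold cross-validation yielded an overall AUC of 0.947 (95% CI: 0.938, 0.955), a lung class AUC of 0.899 (95% CI: 0.884, 0.915), a breast class AUC of 0.990 (95% CI: 0.983, 0.997), a melanoma class AUC of 0.882 (95% CI: 0.858, 0.906), a renal class AUC of 0.870 (95% CI: 0.823, 0.918), and an other class AUC of 0.885 (95% CI: 0.843, 0.949). These data establish that whole-brain imaging features are discriminative enough to allow accurate diagnosis of the primary organ site of malignancy. Our end-to-end deep radiomic approach has great potential for classifying metastatic tumor types from whole-brain MRI images. Further refinement may offer an invaluable clinical tool to expedite primary cancer site identification for precision treatment and improved outcomes.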


Exploring Adversarial Robustness of Multi-Sensor Perception Systems in Self Driving

James Tu , Huichen Li , Xinchen Yan , Mengye Ren , Yun Chen , Ming Liang , Eilyan Bitar , Ersin Yumer , Raquel Urtasun

Category: Computer Vision | Machine Learning

2021-01-17

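Modern self-driving perception systems have been shown to improve when they process complementary inputs such as LiDAR together with images. In isolation, 2D images have been found to be extremely vulnerable to adversarial attacks. Yet there have been limited studies on the adversarial robustness of multi-modal models that fuse LiDAR features with image features. Furthermore, existing works do not consider physically realizable perturbations that are consistent across input modalities. In this paper, we showcase practical susceptibilities of multi-sensor detection by placing an adversarial object on top of a host vehicle. We focus on physically realizable and input-agnostic attacks because they are feasible to execute in practice, and show that a single universal adversary can hide different host vehicles from state-of-the-art multi-modal detectors. Our experiments demonstrate that successful attacks are primarily caused by easily corrupted image features. Furthermore, we find that in modern sensor fusion methods that project image features into 3D, adversarial attacks can exploit the projection process to generate false positives across distant regions in 3D. Towards more robust multi-modal perception systems, we show that adversarial training with feature denoising can significantly boost robustness to such attacks. However, we find that standard adversarial defenses still struggle to prevent false positives that are caused by inaccurate associations between 3D LiDAR points and 2D pixels.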


UPSNet: A Unified Panoptic Segmentation Network

Yuwen Xiong , Renjie Liao , Hengshuang Zhao , Rui Hu , Min Bai , Ersin Yumer , Raquel Urtasun

Category:

2019-01-12

In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. On top of a single backbone residual network, we first design a deformable convolution based semantic segmentation head and a Mask R-CNN style instance segmentation head which solve these two subtasks simultaneously. More importantly, we introduce a parameter-free panoptic head which solves the panoptic segmentation via pixel-wise classification. It first leverages the logits from the previous two heads and then innovatively expands the representation for enabling prediction of an extra unknown class which helps better resolve the conflicts between semantic and instance segmentation. Additionally, it handles the challenge caused by the varying number of instances and permits back propagation to the bottom modules in an end-to-end manner. Extensive experimental results on Cityscapes, COCO and our internal dataset demonstrate that our UPSNet achieves state-of-the-art performance with much faster inference. Code has been made available at: https://github.com/uber-research/UPSNet. * Equal contribution. † This work was done when Hengshuang Zhao was an intern at Uber ATG.
